Database Research Journal (SIGDB)
Korean Title |
Prime-Number-Based Hash Algorithm and Clustering Method for Effective Alignment of Reads Generated by Next-Generation Sequencers |
English Title |
Prime Number based Hash Algorithm and Clustering Approach for Effective Alignment of Reads from Next Generation Sequencing |
Author |
경규호
박치현
여윤구
박상현
Kyuho Kyung
Chihyun Park
Yunku Yeu
Sanghyun Park
|
Citation |
VOL 28 NO. 02 PP. 0037 ~ 0053 (2012. 08) |
Çѱ۳»¿ë (Korean Abstract) |
À¯Àüü ¿°±â¼¿ Á¤·ÄÀº À¯Àüü ¿¬±¸¿¡¼ °¡Àå ±âº»ÀûÀÌ°í ÇÙ½ÉÀûÀÎ ¹®Á¦·Î ¾à 30¾ï °³ÀÇ ¿°±â¼¿ ¹®ÀÚ·Î ±¸¼ºµÈ ·¹ÆÛ·±½º Áö³ð ½ÃÄö½º¿¡ ¿°±â¼¿ Á¶°¢À» ºñ±³ Á¤·ÄÇÏ¿© ¸ÊÇεǴ À§Ä¡¸¦ Ž»öÇÏ´Â ¹æ¹ýÀÌ´Ù. ƯÈ÷ ÃÖ±Ù Â÷¼¼´ë ¿°±â¼¿ ºÐ¼®(Next Generation Sequencing) ±â¼úÀÌ ¹ßÀüÇÏ¸é¼ »ý¼ºµÈ ´ë·®ÀÇ ÂªÀº ¸®µå(read)¸¦ ºü¸£°í Á¤È®ÇÏ°Ô ·¹ÆÛ·±½º Áö³ð ½ÃÄö½º¿¡ Á¤·ÄÇÒ ¼ö ÀÖ´Â ¹æ¹ý¿¡ ´ëÇÑ ¿¬±¸°¡ ¼öÇàµÇ°í ÀÖ´Ù. ªÀº ¸®µå Á¤·Ä ¾Ë°í¸®ÁòÀº ´ë·®ÀÇ µ¥ÀÌÅ͸¦ ºü¸£°í Á¤È®ÇÏ°Ô ¸ÊÇÎÇØ¾ß Çϱ⠶§¹®¿¡ ¼Óµµ¿Í Á¤È®µµ¿¡ ÁÖ¿äÇÑ Àǹ̸¦ µÎ°í ÀÖÁö¸¸ µÎ ¿ä¼Ò »çÀÌÀÇ Æ®·¹À̵å¿ÀÇÁ(trade-off) °ü°è ¶§¹®¿¡ µÎ °¡Áö ¸ðµÎ¸¦ ¸¸Á·ÇÏ´Â ¾Ë°í¸®ÁòÀ» ¸¸µé±â¶õ ¸Å¿ì ¾î·Æ´Ù. º» ¿¬±¸¿¡¼´Â ¼Ò¼ö¸¦ ÀÌ¿ëÇÏ¿© A, C, G, T, NÀ¸·Î ÀÌ·ç¾îÁø ¿°±â¼¿À» È¿°úÀûÀ¸·Î Ç¥ÇöÇÒ ¼ö ÀÖ´Â »õ·Î¿î Çؽà ¹æ¹ýÀ» Á¦¾ÈÇÏ°í ¹Ì½º¸ÅÄ¡(mis-match)¸¦ °í·ÁÇÒ ¼ö Àִ Ŭ·¯½ºÅ͸µ(clustering) ¹æ¹ý°ú ºñÆ®(bit) º¯È¯À» Àû¿ëÇÏ¿© Á¤È®ÇÏ°í ºü¸£°Ô ¿°±â¼¿À» Á¤·ÄÇÒ ¼ö ÀÖ´Â ¾Ë°í¸®ÁòÀ» Á¦½ÃÇÑ´Ù. Á¦¾ÈÇÏ´Â ¹æ¹ýÀÇ ¿ì¼ö¼ºÀ» °ËÁõÇϱâ À§ÇØ NCBIÀÇ ½ÇÁ¦ Àΰ£ ¿°»öü¸¦ ·¹ÆÛ·±½º ½ÃÄö½º·Î »ç¿ëÇÏ¿´°í, µ¿ÀÏ ½ÃÄö½º¸¦ ÀÌ¿ëÇÏ¿© ¸¸µç ½Ã¹Ä·¹ÀÌƼµå µ¥ÀÌÅ͸¦ ÀÌ¿ëÇÏ¿© BWA¿Í Á¦¾ÈÇÏ´Â ¹æ¹ý¿¡ ´ëÇÑ ºñ±³ ½ÇÇèÀ» ¼öÇàÇÏ¿´´Ù. °á°úÀûÀ¸·Î Á¦¾ÈÇÏ´Â ¹æ¹ýÀÌ ºñ±³ ¾Ë°í¸®Áò°ú ºñ±³ÇÏ¿© ´õ ³ôÀº Á¤È®µµ¿Í ´õ ³·Àº ¿À·ùÀ²ÀÌ È®ÀÎ µÇ¾ú´Ù.
|
English Abstract |
Sequence alignment, which maps DNA sequence fragments onto a reference genome sequence composed of about 3 billion nucleotides, is a basic and fundamental problem in genomics. With the advent of next generation sequencing (NGS) machines and the continued development of the technology, methods that can quickly and accurately align large numbers of short reads to a reference genome have been studied. Because an alignment method has to map short reads to the reference both quickly and accurately, speed and accuracy are the major factors. However, the two are in a trade-off relation, and it is difficult to design an algorithm that satisfies both. In this paper, we propose a novel hash method that represents a genomic fragment composed of A, C, G, T and N using prime numbers. We also propose a clustering approach that accounts for mismatches and a bit transformation approach that enhances alignment speed. To verify the superiority of our method, we used the real genome sequence published by NCBI as the reference data and generated simulated data from it. We compared performance with the BWA algorithm on the simulated data. The results showed that our method had higher accuracy and a lower error rate than the comparison method.
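As a rough illustration of the kind of pipeline the abstract describes (a prime-based hash for fragments over A, C, G, T and N, plus a bit-level comparison that tolerates mismatches), the following minimal Python sketch may help. The prime assigned to each base, the hash formula and its multiplier, the table modulus, the 3-bit codes, and every function name are assumptions made for illustration only; the abstract does not specify the paper's actual scheme, and the clustering step is not modeled here.

```python
# Hypothetical illustration: none of these constants or formulas come from the paper.
BASE_PRIME = {"A": 2, "C": 3, "G": 5, "T": 7, "N": 11}   # assumed prime per base
BASE_BITS = {"A": 0b000, "C": 0b001, "G": 0b010, "T": 0b011, "N": 0b100}  # assumed 3-bit codes
MODULUS = 1_000_003  # assumed prime hash-table size
MULTIPLIER = 13      # assumed multiplier, larger than every base prime


def prime_hash(fragment: str) -> int:
    """Hash a fragment by folding in each base's prime, weighted by position."""
    h = 0
    for base in fragment:
        h = (h * MULTIPLIER + BASE_PRIME[base]) % MODULUS
    return h


def pack_bits(fragment: str) -> int:
    """Pack a fragment into one integer, 3 bits per base."""
    value = 0
    for base in fragment:
        value = (value << 3) | BASE_BITS[base]
    return value


def mismatches(a: str, b: str) -> int:
    """Count differing positions of two equal-length fragments via XOR."""
    if len(a) != len(b):
        raise ValueError("fragments must have equal length")
    diff = pack_bits(a) ^ pack_bits(b)
    count = 0
    while diff:
        if diff & 0b111:      # any set bit in this 3-bit group means a mismatch
            count += 1
        diff >>= 3
    return count


def build_index(reference: str, k: int) -> dict[int, list[int]]:
    """Map the hash of every length-k window of the reference to its positions."""
    index: dict[int, list[int]] = {}
    for pos in range(len(reference) - k + 1):
        index.setdefault(prime_hash(reference[pos:pos + k]), []).append(pos)
    return index


def align(read: str, reference: str, index: dict[int, list[int]],
          k: int, max_mismatches: int) -> list[int]:
    """Look up the read's first k bases, then verify each candidate position."""
    hits = []
    for pos in index.get(prime_hash(read[:k]), []):
        window = reference[pos:pos + len(read)]
        if len(window) == len(read) and mismatches(read, window) <= max_mismatches:
            hits.append(pos)
    return hits


if __name__ == "__main__":
    reference = "ACGTNACGTTGCA"
    index = build_index(reference, k=4)
    print(align("ACGTTGCA", reference, index, k=4, max_mismatches=1))
```

With this toy reference, the read's seed `ACGT` occurs at positions 0 and 5; only position 5 survives verification with at most one mismatch. Because the seed must match exactly, this sketch only tolerates mismatches outside the first `k` bases, which is one reason a real aligner needs something more, such as the clustering step the abstract mentions.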
|
Keyword |
Next Generation Sequencing Analysis
Chromosome Alignment
Hash Algorithm
Bit Transformation and Operations
Prime Number
Twin Prime
Cousin Prime
Big Data
Pattern
Next Generation Sequencing
Genome Alignment
Hash Algorithm
Bit Transformation
Twin Prime
Cousin Prime
Big Data
Pattern
|